GPX - Gardens Point XML IR at INEX 2005
Abstract
The INEX 2006 evaluation was based on the Wikipedia collection in XML format. It consisted of several tasks that required different approaches. In this paper we describe the approach we adopted in an attempt to satisfy the requirements of all four tasks: Thorough, Focused, Relevant in Context, and Best in Context. We used the same underlying system for all tasks. The retrieval strategy is based on the construction of a collection sub-tree consisting of all nodes that contain one or more of the search terms. Nodes containing search terms are then assigned a score using the GPX ranking scheme, which incorporates and extends TF-IDF or BM25 variants. Scores are propagated upwards in the document XML tree, and finally all XML elements are ranked. We present results that demonstrate that the approach is versatile and produces consistently good performance. We also provide an empirical analysis of the GPX ranking scheme and compare its performance against baseline TF-IDF and BM25 scoring schemes.
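The score-then-propagate strategy described in the abstract can be illustrated with a minimal sketch. This is not the GPX ranking formula, which the abstract does not give. The toy TF-IDF leaf score, the `DECAY` constant, the `idf` table, and the function names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical damping factor applied each time a score moves up one level.
# The value is an assumption, not a constant taken from the paper.
DECAY = 0.5


def leaf_score(text, terms, idf):
    """Toy TF-IDF score for a piece of text (stand-in for the real scheme)."""
    tokens = text.lower().split()
    return sum(tokens.count(t) * idf.get(t, 0.0) for t in terms)


def score_tree(elem, terms, idf, path, results):
    """Score an element as its own text score plus decayed child scores."""
    # Text that belongs directly to this element, excluding descendants.
    own_text = (elem.text or "") + " ".join(child.tail or "" for child in elem)
    score = leaf_score(own_text, terms, idf)
    tag_counts = {}
    for child in elem:
        tag_counts[child.tag] = tag_counts.get(child.tag, 0) + 1
        child_path = f"{path}/{child.tag}[{tag_counts[child.tag]}]"
        score += DECAY * score_tree(child, terms, idf, child_path, results)
    if score > 0:
        # Only elements on a path to a search term enter the ranked sub-tree.
        results.append((score, path))
    return score


def rank_elements(xml_string, terms, idf):
    """Return (score, path) pairs for all scoring elements, best first."""
    root = ET.fromstring(xml_string)
    results = []
    score_tree(root, terms, idf, f"/{root.tag}[1]", results)
    return sorted(results, reverse=True)


doc = ("<article><sec><p>xml retrieval</p><p>other words</p></sec>"
       "<sec><p>xml</p></sec></article>")
ranked = rank_elements(doc, ["xml", "retrieval"], {"xml": 1.0, "retrieval": 2.0})
for score, path in ranked:
    print(f"{score:.2f}  {path}")
```

In this sketch the paragraph that contains both terms ranks first, and each ancestor receives a damped share of its children's scores, so every element in the sub-tree ends up with a rank. Elements with no matching descendants are never added to the result list.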
Similar resources
Fine Tuning INEX
Since 2002, INEX has been the benchmark for evaluating XML information retrieval (XML-IR) systems. INEX has based much of its evaluation methodology on that of existing workshops, albeit modified for the specific requirements of XML-IR. Because of some of these modifications, the evaluation phase of INEX takes considerably longer than that of comparable workshops. Here, we investigate ways to spee...
What XML-IR Users May Want
It is assumed that, by focusing on retrieval at a granularity finer than whole documents, XML-IR systems will better satisfy users’ information needs than traditional IR systems. Participants in INEX’s Ad-hoc track develop XML-IR systems based upon this assumption, using an evaluation methodology in the tradition of Cranfield. However, since the inception of INEX, debate has raged on how applicable ...
NLPX at INEX 2005
XML information retrieval (XML-IR) systems aim to provide users with highly exhaustive and highly specific results. To interact with XML-IR systems users must express both their content and structural needs in the form of a structured query. Historically, these structured queries have been formatted using formal languages such as XPath or NEXI. Unfortunately, formal query languages are very com...
The simplest evaluation measures for XML information retrieval that could possibly work
This paper reviews several evaluation measures developed for evaluating XML information retrieval (IR) systems. We argue that these measures, some of which are currently in use by the INitiative for the Evaluation of XML Retrieval (INEX), are complicated, hard to understand, and hard to explain to users of XML IR systems. To show the value of keeping things simple, we report alternative evaluat...
EPRUM Metrics and INEX 2005
Standard Information Retrieval (IR) metrics are not well suited for new paradigms like XML IR in which retrievable information units are document elements. These units are neither predefined nor independent, and the elements returned by IR systems may overlap and contain near misses. Part of the problem stems from the classical hypotheses on the user behaviour that do not take into account the ...